Method and apparatus for DTMF detection and voice mixing in the CELP parameter domain

ABSTRACT

A method and apparatus for DTMF detection and voice mixing in the code-excited linear prediction (CELP) parameter space, without fully decoding and reconstructing the speech signal. The apparatus includes a Dual-Tone Multi-Frequency (DTMF) signal detection module and a multi-input mixing module. The DTMF signal detection module detects DTMF signals by computing characteristic features from the input CELP parameters and comparing them with known features of DTMF signals. The multi-input mixing module mixes multiple sets of input CELP parameters, which represent multiple voice signals, into a single set of CELP parameters. The mixing computation is performed by analyzing each set of input CELP parameters, determining the order of importance of the input sets, selecting a strategy for mixing the CELP parameters, and outputting the mixed CELP parameters. The method includes inputting one or more sets of CELP parameters and external commands, detecting DTMF tones, mixing multiple sets of CELP parameters, and outputting the DTMF signal, if detected, and the mixed CELP parameters.

CROSS-REFERENCES TO RELATED APPLICATIONS

This patent application claims priority to U.S. Provisional Patent Application Ser. No. 60/421,342 (Attorney Docket Number 021318-001200US) titled “Method for In-Band DTMF Detection & Generation In Voice Transcoder,” filed Oct. 25, 2002 and U.S. Provisional Patent Application Ser. No. 60/421,271 (Attorney Docket Number 021318-001400US) titled “Method for Multiple Input Source Voice Transcoding,” filed Oct. 25, 2002, which are both incorporated by reference for all purposes.

BACKGROUND OF INVENTION

The present invention relates generally to processing telecommunication signals. More particularly, the invention provides a method and apparatus for performing DTMF (i.e., Dual-Tone Multi-Frequency) detection and voice mixing in the CELP (i.e., Code Excited Linear Prediction) domain. Specifically, it relates to a method and apparatus for detecting the presence of DTMF tones in a compressed signal from the CELP parameters, and also for mixing multiple input compressed voice signals, represented by multiple sets of CELP parameters, into a single set of CELP parameters. Merely by way of example, the invention has been applied to voice transcoding, but it would be recognized that the invention has a much broader range of applicability.

Telecommunications techniques have developed over the years. Recently, a variety of digital voice coders have been developed to meet the bandwidth demands of different packet networks and mobile communication systems. Digital voice coders provide compression of a digitized voice signal as well as the reverse transformation functions. Rapid growth in the diversity of networks and wireless communication systems generally requires that speech signals be converted between different compression formats. A conventional method for such conversion is to place two voice coders in tandem to serve a single connection. In such a case, the first compressed speech signal is decoded to a digitized signal through the first voice decoder, and the resulting digitized signal is re-encoded to a second compressed speech signal through the second voice encoder. Two voice coders in tandem are commonly referred to as a “tandem coding” approach. The tandem coding approach fully decodes the compressed signal back to a digitized signal, such as a Pulse Code Modulation (PCM) representation, and then re-encodes the signal. This often requires a large amount of processing and incurs increased delays. More efficient approaches include technologies called smart transcoding, among others.

In addition to the requirements of voice transcoding among current diverse networks and wireless communication systems, it is also necessary to provide functionality for advanced feature processing. A specific example of an advanced feature is Dual-Tone Multi-Frequency (DTMF) signal detection. DTMF signaling is widely used in telephone dialing, voice mail, and electronic banking systems, and even with Internet Protocol (IP) phones to key in an IP address. In telecommunications speech codecs, the in-band DTMF signals are encoded to a compressed bitstream. Conventional DTMF signal detection is performed in the speech signal space. As merely an example, the Goertzel algorithm with a two-pole Infinite Impulse Response (IIR) type filter is widely used to extract the necessary spectral information from an input digitized signal and to form the basis of DTMF detection.
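For contrast with the CELP-domain approach, the conventional speech-domain detector can be sketched with the Goertzel recurrence. This is a minimal illustration that assumes 8 kHz PCM input and a 205-sample analysis block; that block length is a common choice but is an assumption here. The helper names `goertzel_power` and `strongest_pair` are hypothetical. The sketch only picks the strongest low-group and high-group tone and omits the level, twist, and timing checks that a real detector needs.

```python
import math
from typing import Sequence, Tuple

LOW_GROUP_HZ = (697, 770, 852, 941)
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)

def goertzel_power(samples: Sequence[float], target_hz: float, fs_hz: float = 8000.0) -> float:
    """Squared magnitude of the DFT bin nearest target_hz, via the two-pole Goertzel IIR."""
    n = len(samples)
    k = round(n * target_hz / fs_hz)              # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2          # s[n] = x[n] + 2cos(w)s[n-1] - s[n-2]
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def strongest_pair(samples: Sequence[float], fs_hz: float = 8000.0) -> Tuple[int, int]:
    """Return the low-group and high-group DTMF frequencies with the most energy."""
    low = max(LOW_GROUP_HZ, key=lambda f: goertzel_power(samples, f, fs_hz))
    high = max(HIGH_GROUP_HZ, key=lambda f: goertzel_power(samples, f, fs_hz))
    return low, high

# Usage: a synthetic "5" (770 Hz + 1336 Hz) over one 205-sample block.
fs = 8000.0
block = [math.sin(2 * math.pi * 770 * t / fs) + math.sin(2 * math.pi * 1336 * t / fs)
         for t in range(205)]
print(strongest_pair(block))  # (770, 1336)
```

This is exactly the decode-then-analyze step that the CELP-domain detector described below avoids.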

When DTMF signal detection is required in voice transcoding, a tandem approach is commonly used. In this case, the input compressed speech signal has to be decoded back to the speech domain for DTMF signal detection, and then re-encoded to a compressed format. Since the processing in smart voice transcoding is performed in the CELP parameter space, known DTMF detection methods are often not suitable. Furthermore, known smart voice transcoding methods do not include DTMF signal detection functionality and are therefore limited.

Another specific example of an advanced feature for voice transcoding is the ability to handle multiple input signals. If the input signals are multiple speech signals, the voice mixer simply mixes the speech signals and outputs the mixed speech signal. However, in a packet network or a wireless communication system, the input signals are multiple compressed signals. Furthermore, with the current diversity of packet networks and wireless communication systems, the input signals may be in various compression formats. The conventional voice mixing solution mixes the input packets by decoding them into speech signals, mixing the speech signals, and re-encoding the mixed speech signals into output packets. This requires significant computation to decode and re-encode each input compressed signal.

In an attempt to improve the voice quality produced by voice mixing for packet networks, certain “smart” conference bridging methods have been proposed. Although such methods can provide side information and can improve the quality of mixed voice signals, they still use a tandem approach that involves decoding, mixing in the speech space, and re-encoding. This approach is often not suitable for a voice transcoder that operates in the CELP parameter space without going to the speech space.

From the above, it is seen that techniques for improved processing of telecommunication signals are highly desired.

BRIEF SUMMARY OF THE INVENTION

According to the present invention, techniques for processing telecommunication signals are provided. More particularly, the invention provides a method and apparatus for performing DTMF detection and voice mixing in the CELP domain. More specifically, it relates to a method and apparatus for detecting the presence of DTMF tones in a compressed signal from the CELP parameters, and also for mixing multiple input compressed voice signals, represented by multiple sets of CELP parameters, into a single set of CELP parameters. Merely by way of example, the invention has been applied to voice transcoding, but it would be recognized that the invention has a much broader range of applicability.

In a specific embodiment, the present invention provides a method and apparatus for advanced feature processing in voice transcoders using CELP parameters. The apparatus receives two kinds of input: external commands, and one or more sets of CELP parameters. The CELP parameters may have been interpolated, if required, to match the frame size, subframe size, or other characteristic. The apparatus comprises a DTMF signal detection module that detects DTMF signals from input CELP parameters, and a multi-input mixing module that mixes CELP parameters from multiple CELP codecs into a single set of CELP parameters. In a specific embodiment, the multi-input mixing module has a dynamic topology and is capable of configuring different topologies according to the number of input compressed signals. The apparatus outputs the DTMF signal, if detected, and the mixed CELP parameters.

The DTMF signal detection module includes a DTMF feature computation unit to compute the DTMF features, DTMF feature pattern tables with stored feature data corresponding to DTMF signals, a DTMF feature comparison unit to compare the computed features with the stored pattern tables, a DTMF feature buffer to store past feature data, and a DTMF decision unit to determine the DTMF signals.

The multi-input mixing module includes a feature detection unit to detect a plurality of speech features from each set of CELP parameters, a sorting unit to rank the importance of each set of CELP parameters, a mixing decision unit to determine the mixing strategy, and a mixing computation unit to perform the mixing of multiple sets of CELP parameters.

The invention provides a method for advanced feature processing in the CELP parameter space. The method includes the following steps:

- receiving external commands and one or more sets of CELP parameters, which may have been interpolated to match the frame size, subframe size, or other characteristic;
- detecting DTMF tones;
- mixing multiple sets of CELP parameters; and
- outputting the detected DTMF signal and the mixed CELP parameters.

According to an alternative specific embodiment, the present invention provides a method for detecting DTMF signals in the CELP parameter space. The method includes computing features for DTMF detection from CELP parameters; comparing the features with pre-computed DTMF feature data; checking the states of DTMF detection and the features in previous subframes; determining the DTMF signals according to the DTMF signal specifications; updating the states and feature parameters of previous subframes; and outputting the detected DTMF digit.

In yet an alternative specific embodiment, the invention provides a method for mixing multiple sets of input CELP parameters. The method includes receiving multiple sets of CELP parameters; mixing sets of CELP parameters according to a chosen mixing strategy; and outputting the mixed CELP parameters. The method of mixing multiple sets of input CELP parameters into a single set of mixed CELP parameters further comprises computing the signal feature parameters required for determining the importance of each input; arranging the order of importance of the multiple sets of input CELP parameters according to the feature parameter computation results; considering priorities from external control commands; selecting the inputs that are mixed; and computing the mixed CELP parameters from the selected inputs.

In an alternative specific embodiment, the invention provides an apparatus for feature processing of telecommunications signals. The apparatus is adapted to operate in a CELP domain without decoding to a speech signal domain. The apparatus has a dual-tone multi-frequency (DTMF) signal detection module. The DTMF signal detection module is adapted to determine one or more DTMF tones based upon at least one or more input CELP parameters. It is also adapted to output the one or more DTMF signals if determined.

In yet an alternative embodiment, the invention provides an apparatus for feature processing of telecommunications signals. The apparatus is adapted to operate in a CELP domain without decoding to a speech signal domain. The apparatus has a multi-input mixing module coupled to the DTMF signal detection module. The multi-input mixing module is adapted to process CELP parameters from more than one CELP-based codec, each representing a respective voice signal, into a single set of CELP parameters.

Numerous benefits exist with the present invention over conventional techniques. In a specific embodiment, the invention provides an easy way of detecting DTMF signals without converting CELP information back into the speech domain. Additionally, the invention can be provided using conventional hardware and software. In certain embodiments, the invention also provides for additional advanced modules that can be coupled to a transcoding technology. Depending upon the embodiment, one or more of these benefits or features can be achieved. These and other benefits are described throughout the present specification and more particularly below.

The accompanying drawings, which are incorporated in and form part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The objects, features, and advantages of the present invention, which are believed to be novel, are set forth with particularity in the appended claims. The present invention, both as to its organization and manner of operation, together with further objects and advantages, may best be understood by reference to the following description, taken in connection with the accompanying drawings.

FIG. 1 is a simplified block diagram representation of an apparatus for DTMF detection and multi-input mixing in the CELP parameter domain according to an embodiment of the present invention.

FIG. 2 illustrates DTMF signal frequency categorization according to an embodiment of the present invention.

FIG. 3 is a simplified block diagram representation of an apparatus for DTMF signal detection according to an embodiment of the present invention.

FIG. 4 is a simplified flowchart of a method for DTMF signal detection using CELP parameters according to an embodiment of the present invention.

FIG. 5 is a simplified block diagram representation of DTMF detection and multi-input mixing within a smart voice transcoder according to an embodiment of the present invention.

FIG. 6 is a simplified block diagram representation of a DTMF detection module in voice transcoding between the GSM-AMR and G.723.1 voice codecs according to an embodiment of the present invention.

FIG. 7 illustrates an LSP representation of DTMF signals from an input GSM-AMR codec bitstream according to an embodiment of the present invention.

FIG. 8 illustrates an LSP representation of DTMF signals from an input G.723.1 codec bitstream according to an embodiment of the present invention.

FIG. 9 is a schematic diagram of a communication link connecting three speakers with a multi-input mixer according to an embodiment of the present invention.

FIG. 10 is a simplified diagram of conventional multi-input mixing among speakers using compressed voice codec formats.

FIG. 11 is a simplified block diagram representation of an apparatus of a multi-input mixing module according to an embodiment of the present invention.

FIG. 12 is a flowchart of a multi-input mixing method according to an embodiment of the present invention.

FIG. 13 is a simplified block diagram representation of an apparatus of multi-input mixing within a voice transcoder according to an embodiment of the present invention.

FIG. 14 is a block diagram representation of an apparatus for a multi-input mixer within a voice transcoder with different voice codec outputs according to an embodiment of the present invention.

FIG. 15 is a schematic diagram of a four-party conference among four participants with different voice codec formats according to an embodiment of the present invention.

FIG. 16 illustrates the frame size differences among the voice codecs G.729, GSM-AMR and G.723.1 according to an embodiment of the present invention.

FIG. 17 is a block diagram of a four-party multi-input mixing system within voice transcoding according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

According to the present invention, techniques for processing telecommunication signals are provided. More particularly, the invention provides a method and apparatus for performing DTMF detection and voice mixing in the CELP domain. More specifically, it relates to a method and apparatus for detecting the presence of DTMF tones in a compressed signal from the CELP parameters, and also for mixing multiple input compressed voice signals, represented by multiple sets of CELP parameters, into a single set of CELP parameters. Merely by way of example, the invention has been applied to voice transcoding, but it would be recognized that the invention has a much broader range of applicability.

FIG. 1 is a block diagram illustrating an advanced feature processing module 100. Preferably, the module has a DTMF detection module and a multi-input mixing module according to an embodiment of the present invention. The module receives two kinds of input: external commands, and one or more sets of CELP parameters derived by unpacking the bitstreams transmitted by one or more CELP-based codecs. The outputs are the DTMF signal, if detected, and the mixed CELP parameters. Advanced feature processing can take different configurations or topologies in different applications. For example, additional processing modules may be included in the advanced feature processing module, the DTMF detection module may be omitted, or the multi-input mixing module may be omitted.

Preferably, the dual-tone multi-frequency (DTMF) signal detection module is adapted to determine one or more DTMF tones based upon at least one or more input CELP parameters (e.g., silence descriptor frames). It is also adapted to output the one or more DTMF signals if determined. Preferably, the multi-input mixing module is adapted to process CELP parameters from more than one CELP-based codec, each representing a respective voice signal, into a single set of CELP parameters.

DTMF signaling is widely used in telephone dialing, voice mail, and electronic banking systems, and even with IP phones to key in an IP address. In many standardized telecommunication speech codecs, the in-band DTMF signals are encoded to a CELP-based bitstream during voice compression. Further details are described throughout the present specification and more particularly below.

A DTMF signal 200 corresponds to one of the sixteen touchtone digits (0-9, A-D, # and *) shown in FIG. 2. Each DTMF signal consists of one low-frequency tone and one high-frequency tone, each chosen from four possible frequencies. In FIG. 2, the horizontal rows represent the low frequencies and the vertical columns represent the high frequencies. The low frequencies are 697, 770, 852 and 941 Hz. The high frequencies are 1209, 1336, 1477 and 1633 Hz. Thus, according to certain embodiments, each of the sixteen DTMF signals is uniquely identified by its pair of low and high frequencies.

In general, the DTMF algorithm should respond to signals whose frequencies are within certain tolerances. Somewhat wider tolerances may also be acceptable. However, wider limits may increase susceptibility to noise and may cause speech to be falsely registered as a digit (digit simulation). The DTMF algorithm should also correctly receive signals whose power levels are within the acceptable range. Note that the sending amplitude and transmission attenuation may be different for different frequencies.
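The frequency grid of FIG. 2 and the tolerance requirement can be illustrated together. This sketch assumes the caller already has estimates of the two tone frequencies. It maps each estimate to the nearest row or column and rejects the pair if either estimate deviates from its nominal tone by more than a relative tolerance. The 2% default is a hypothetical placeholder, not a value taken from a DTMF specification, and `classify_tone_pair` is an illustrative name.

```python
from typing import Optional, Sequence, Tuple

LOW_GROUP_HZ = (697, 770, 852, 941)       # rows of FIG. 2
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)  # columns of FIG. 2
KEYPAD = (
    ("1", "2", "3", "A"),
    ("4", "5", "6", "B"),
    ("7", "8", "9", "C"),
    ("*", "0", "#", "D"),
)

def _nearest(freq_hz: float, group: Sequence[int]) -> Tuple[int, float]:
    """Index of the nominal tone closest to freq_hz and its relative deviation."""
    idx = min(range(len(group)), key=lambda i: abs(freq_hz - group[i]))
    return idx, abs(freq_hz - group[idx]) / group[idx]

def classify_tone_pair(low_hz: float, high_hz: float,
                       rel_tol: float = 0.02) -> Optional[str]:
    """Map estimated (low, high) tone frequencies to a digit, or None if out of tolerance."""
    row, dev_low = _nearest(low_hz, LOW_GROUP_HZ)
    col, dev_high = _nearest(high_hz, HIGH_GROUP_HZ)
    if dev_low > rel_tol or dev_high > rel_tol:   # hypothetical tolerance check
        return None
    return KEYPAD[row][col]

print(classify_tone_pair(853.0, 1480.0))   # 9
print(classify_tone_pair(1000.0, 1400.0))  # None
```

The low and high groups do not overlap, so each estimate can be matched within its own group independently.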

Furthermore, the DTMF algorithm should recognize signals whose duration exceeds the minimum expected value from subscribers. To guard against false signal indications, the DTMF algorithm should not respond to signals whose duration is less than a specified maximum value. Similarly, the DTMF algorithm should recognize pause intervals greater than a specified minimum value. Some errors are caused by brief breaks, such as double registration of a signal when reception is interrupted by a short break in transmission or by a noise pulse. To avoid these, interruptions shorter than a specified maximum value must not be recognized.
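The duration, pause, and interruption rules above can be expressed as a small state machine over per-subframe candidate digits, which is roughly the role of the DTMF decision unit. This is a minimal sketch that assumes 5 ms subframes, as in GSM-AMR. It uses hypothetical thresholds of 40 ms minimum tone, 40 ms minimum pause, and 10 ms maximum bridged dropout; real values would come from the applicable DTMF specification. `detect_digits` and its parameters are illustrative names.

```python
from math import ceil
from typing import Iterable, List, Optional

def detect_digits(labels: Iterable[Optional[str]], subframe_ms: float = 5.0,
                  min_tone_ms: float = 40.0, min_pause_ms: float = 40.0,
                  max_dropout_ms: float = 10.0) -> List[str]:
    """Register each digit once, applying duration, pause and dropout rules.

    labels holds one entry per subframe: the candidate digit from the
    comparison step, or None when no DTMF pattern matched.
    """
    min_tone = ceil(min_tone_ms / subframe_ms)         # subframes a tone must persist
    min_pause = ceil(min_pause_ms / subframe_ms)       # silent subframes that re-arm
    max_dropout = int(max_dropout_ms // subframe_ms)   # gaps this short are bridged
    detected: List[str] = []
    cand: Optional[str] = None   # digit currently being timed
    run = 0                      # tone subframes counted for cand
    gap = 0                      # consecutive non-DTMF subframes
    armed = True                 # False after a registration until a valid pause
    for label in labels:
        if label is None:
            gap += 1
            if gap > max_dropout:        # too long to be a dropout: forget candidate
                cand, run = None, 0
            if gap >= min_pause:         # genuine pause: allow the next registration
                armed = True
            continue
        gap = 0
        if label == cand:
            run += 1
        else:
            cand, run = label, 1
        if armed and run >= min_tone:    # long enough and not already registered
            detected.append(label)
            armed = False
    return detected

# A 5 ms break inside a 105 ms tone must not cause double registration.
print(detect_digits(["5"] * 10 + [None] + ["5"] * 10))   # ['5']
```

A change of digit without an intervening pause is not registered here. This is a conservative choice for the sketch and is not mandated by the text above.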

FIG. 3 illustrates the DTMF detection module 300 in detail. This diagram is merely an example, which should not unduly limit the scope of the claims herein. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. The DTMF detection module takes the CELP parameters and external control commands as input. The DTMF detection module comprises a DTMF feature parameter generation sub-module that computes the DTMF signal characteristic features from CELP parameters, a pre-defined look-up table that stores feature data corresponding to each DTMF signal, a comparison sub-module that computes the similarities between input feature parameters and look-up tables, a DTMF decision sub-module that determines DTMF signals through a finite-state machine, and a buffer that stores the data of previous subframes. As an example, DTMF signal characteristic feature parameters can be signal energy information, pitch information and spectrum information. Such information can be obtained from input CELP parameters. The comparison sub-module checks the input signals by matching input feature parameters with look-up tables. If the matching results are above a certain threshold, the potential DTMF digits will be output to the DTMF decision sub-module. The DTMF decision sub-module checks previous states against the DTMF signal requirement specifications to determine whether a DTMF tone is present.
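The comparison sub-module can be pictured as a nearest-pattern search. This sketch assumes each subframe has already been reduced to a fixed-length feature vector, for example energy, pitch, and spectral values derived from the CELP parameters. It also assumes the look-up table maps each digit to a reference vector of the same length. Euclidean distance with a maximum-distance bound stands in for the unspecified similarity measure and threshold. `best_match` is a hypothetical name, and the pattern values in the usage lines are toy numbers.

```python
import math
from typing import Mapping, Optional, Sequence, Tuple

def best_match(features: Sequence[float],
               patterns: Mapping[str, Sequence[float]],
               max_distance: float) -> Tuple[Optional[str], float]:
    """Return (digit, distance) for the closest pattern, or (None, distance) if too far."""
    best_digit: Optional[str] = None
    best_dist = math.inf
    for digit, ref in patterns.items():
        if len(ref) != len(features):
            raise ValueError(f"pattern {digit!r} has the wrong length")
        d = math.dist(features, ref)          # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_digit, best_dist = digit, d
    # A small distance plays the role of "similarity above threshold".
    return (best_digit if best_dist <= max_distance else None), best_dist

patterns = {"1": (0.10, 0.20), "2": (0.50, 0.60)}   # toy reference vectors
print(best_match((0.12, 0.21), patterns, max_distance=0.1)[0])   # 1
```

The candidate digit, or None, would then be passed to the decision stage, which applies the timing rules.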

Preferably, the dual-tone multi-frequency (DTMF) signal detection module has a DTMF feature computation unit capable of receiving the one or more CELP parameters and external commands and computing one or more DTMF features. The module also has one or more DTMF feature pattern tables having one or more specific feature data corresponding to the one or more DTMF signals. A DTMF feature comparison unit is also included. The DTMF feature comparison unit is adapted to process the one or more DTMF features derived from the DTMF feature computation unit with the one or more specific feature data in the DTMF feature pattern tables to identify one or more DTMF specific signals and to classify the one or more DTMF specific signals. A DTMF feature buffer is included. The feature buffer is capable of storing the one or more DTMF feature parameters and the one or more DTMF classification data of one or more previous sub-frames or frames. Additionally, the module includes a DTMF decision unit capable of determining the one or more DTMF signals from DTMF classification data of a current and one or more previous sub-frames or frames according to one or more DTMF specifications and sending out the determined DTMF signals. Preferably, the DTMF feature computation unit processes the one or more DTMF features using at least one or more of linear prediction parameter information, pitch information, and energy information. The DTMF feature pattern tables have specific pre-computed feature data derived from CELP parameters corresponding to the one or more DTMF signals. In certain embodiments, the DTMF feature comparison unit classifies DTMF specific signals corresponding to the 16 digits “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “9”, “0”, “A”, “B”, “C”, “D”, “#”, and “*” according to the International Telecommunication Union (ITU) specification.
Depending upon the embodiment, the DTMF decision unit further comprises a logical state machine and DTMF signal criteria to determine the one or more DTMF signals and one or more specific digits. These and other features are described throughout the present specification and more particularly below.

FIG. 4 illustrates a flowchart diagram of the DTMF detection algorithm 400. First, the DTMF features are computed from the CELP parameters of the input codec, such as Line Spectral Pairs (LSP), pitch lag, and gains. Second, the computed features are compared with features in predefined tables for the sixteen possible DTMF signals. If there is no match, the DTMF detect flag is reset and no DTMF signal state is reached. An update of all necessary data takes place. If there is another input subframe, the detection algorithm continues; otherwise, the detection algorithm ends. If there is a DTMF match, the potential detection results are checked against the DTMF signal requirement specification. If they comply, the DTMF flag is set and the DTMF digit is signaled to the output. Again, an update of all necessary data takes place. If there is another subframe, the detection algorithm continues; otherwise, the detection algorithm ends. The detection algorithm operates entirely in the CELP coding parameter space and is performed for every input subframe.

An application of advanced feature processing is voice transcoding between two Code Excited Linear Prediction (CELP) based voice codecs, as shown in the block diagram 500 of FIG. 5. The source codec unpacker module unpacks the source codec bitstream to produce the CELP parameters. The CELP parameter interpolation module interpolates the CELP parameters to match the frame length and subframe length of the destination codec if required. The interpolated CELP parameters are mapped to encoded destination codec parameters. The destination codec packer packs the encoded parameters into the bitstream in the required format. In addition to this typical voice transcoding approach, an advanced feature processing module 501 is added to the voice transcoder. The advanced feature processing module takes the interpolated CELP parameters as its input and computes the desired features. The resulting features can be handled in three ways:

- output in parallel with the bitstream of the destination codec (transmitted out-of-band);
- passed to the voice transcoder for enhanced processing (transmitted in-band); or
- transmitted both in-band and out-of-band.

The DTMF detection algorithm works in parallel with voice transcoding, i.e., it does not interrupt the main voice transcoding stream.

As an example, the DTMF signal detection is applied to the voice transcoder between the GSM-AMR voice codec and the G.723.1 voice codec. Examples of transcoding methods and systems can be found in the following applications, commonly owned and hereby incorporated by reference for all purposes:

- Method & Apparatus for Transcoding Video & Speech Signals, in the name of Jabri, Marwan Anwar, PCT/US02/08218, filed Mar. 13, 2002.
- A Transcoding Method And System Between CELP-Based Speech Codes, in the names of Jabri, Marwan Anwar; Wang, Jianwei; and Gould, Stephen, PCT/US03/00649, filed Jan. 8, 2003.

In a specific embodiment, the DTMF signal detection module and the multi-input mixing module are incorporated within a CELP-based voice transcoder.

FIG. 6 shows a simplified block diagram of a full-duplex GSM-AMR/G.723.1 voice transcoder 600 enabled with the advanced feature of DTMF detection. Using the DTMF signal detection procedure of the present invention, DTMF detection can be performed on GSM-AMR input CELP parameters in parallel with the voice transcoding process to a G.723.1 codec bitstream. First, each 20 ms frame of the input GSM-AMR bitstream is unpacked to CELP parameters for four 5 ms subframes. These four GSM-AMR subframes, together with two GSM-AMR subframes from the CELP parameters of the next 20 ms frame, are interpolated into one 30 ms G.723.1 frame of CELP parameters. The resulting interpolated CELP parameters are mapped and packed to the bitstream for one G.723.1 frame. In parallel with this procedure, the CELP parameters of the four GSM-AMR subframes are fed to a DTMF detection module inside the voice transcoder. The DTMF detection module performs three steps on each subframe of CELP parameters:

- it computes the DTMF features;
- it compares them with pre-defined DTMF feature data; and
- it determines whether the input compressed speech signal contains a DTMF signal according to the minimum requirements of the DTMF specification.

If the input feature parameters match the pre-defined DTMF data in the look-up tables and satisfy the requirements of DTMF signals through the described finite-state machine, the detected DTMF digit is signaled to the output. If the DTMF detection module is enabled in the voice transcoder from GSM-AMR to G.723.1, the DTMF detection algorithm executes on every incoming GSM-AMR frame. Thus, it is able to detect DTMF signals from the input CELP parameters at all times during voice transcoding.
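One 30 ms G.723.1 frame spans six 5 ms GSM-AMR subframes. Every three 20 ms GSM-AMR frames therefore yield two G.723.1 frames. The sketch below buffers per-subframe values and linearly resamples each six-subframe window onto the four 7.5 ms G.723.1 subframes. Linear interpolation of a scalar per-subframe parameter is an assumption made for illustration only. The transcoder's actual interpolation of LSP, pitch, and gain parameters is not specified here, and `regroup` and `resample_window` are hypothetical names.

```python
from typing import Iterable, Iterator, List, Sequence

AMR_SUBFRAME_MS = 5.0            # GSM-AMR: 20 ms frame = 4 x 5 ms subframes
AMR_SUBFRAMES_PER_FRAME = 4
G7231_SUBFRAME_MS = 7.5          # G.723.1: 30 ms frame = 4 x 7.5 ms subframes
G7231_SUBFRAMES_PER_FRAME = 4
WINDOW = 6                       # GSM-AMR subframes per G.723.1 frame (30 ms / 5 ms)

def regroup(amr_frames: Iterable[Sequence[float]]) -> Iterator[List[float]]:
    """Buffer per-subframe values and yield one six-subframe window per G.723.1 frame."""
    buf: List[float] = []
    for frame in amr_frames:
        if len(frame) != AMR_SUBFRAMES_PER_FRAME:
            raise ValueError("expected four subframe values per GSM-AMR frame")
        buf.extend(frame)
        while len(buf) >= WINDOW:
            yield buf[:WINDOW]
            buf = buf[WINDOW:]   # leftover subframes start the next window

def resample_window(values: Sequence[float]) -> List[float]:
    """Linearly interpolate six 5 ms subframe values at the four 7.5 ms subframe centres."""
    src_t = [(i + 0.5) * AMR_SUBFRAME_MS for i in range(len(values))]
    out: List[float] = []
    for j in range(G7231_SUBFRAMES_PER_FRAME):
        t = (j + 0.5) * G7231_SUBFRAME_MS
        k = min(max(int((t - src_t[0]) // AMR_SUBFRAME_MS), 0), len(values) - 2)
        w = (t - src_t[k]) / AMR_SUBFRAME_MS     # fractional position between k and k+1
        out.append((1.0 - w) * values[k] + w * values[k + 1])
    return out

frames = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]   # three 20 ms frames = 60 ms
print(list(regroup(frames)))   # [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
```

In the arrangement of FIG. 6, each GSM-AMR subframe would also be handed to the DTMF detector as it arrives, independently of this regrouping.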

Similarly, in transcoding from G.723.1 to GSM-AMR, the DTMF detection computation can be applied to the incoming G.723.1 frames. Slight variations will exist due to the different subframe sizes and frame sizes of the GSM-AMR and G.723.1 codecs.

To show that distinctive features of DTMF signals can be computed from CELP parameters, FIG. 7 illustrates the Line Spectral Pairs (LSP) parameters 700 of incoming GSM-AMR frames at the rate of 12.2 kbps for the possible DTMF digits. FIG. 8 illustrates the Line Spectral Pairs parameters 800 of incoming G.723.1 frames at the rate of 6.3 kbps for the possible DTMF digits. Similarly, the unpacked CELP pitch lag and gain information are used to detect and classify the DTMF digits.
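One way such LSP patterns might be turned into a compact feature is sketched below. The sketch assumes cosine-domain LSPs (q_i = cos ω_i) at an 8 kHz sampling rate. It relies on the tendency of LSPs to crowd together around sharp spectral peaks, so a two-tone input tends to produce two tightly spaced adjacent pairs. Taking the midpoints of the two tightest non-overlapping pairs as tone estimates is a hypothetical feature, not necessarily the one used in the described embodiment. The usage LSP values are synthetic.

```python
import math
from typing import List, Sequence, Tuple

FS_HZ = 8000.0  # narrowband sampling rate (assumed)

def lsp_cos_to_hz(q: Sequence[float]) -> List[float]:
    """Convert cosine-domain LSPs q_i = cos(w_i) to frequencies in Hz, ascending."""
    return sorted(math.acos(max(-1.0, min(1.0, x))) * FS_HZ / (2.0 * math.pi) for x in q)

def two_tone_estimate(freqs_hz: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical feature: midpoints of the two tightest non-overlapping adjacent LSP pairs."""
    f = sorted(freqs_hz)
    if len(f) < 5:
        raise ValueError("need at least five LSPs")
    order = sorted(range(len(f) - 1), key=lambda i: f[i + 1] - f[i])  # pairs by spacing
    first = order[0]
    second = next(i for i in order[1:] if abs(i - first) > 1)         # must not share an LSP
    lo, hi = sorted(((f[first] + f[first + 1]) / 2, (f[second] + f[second + 1]) / 2))
    return lo, hi

# Synthetic 10th-order LSP set with tight pairs around 770 Hz and 1336 Hz.
freqs = [300, 765, 775, 1100, 1331, 1341, 2000, 2500, 3000, 3500]
q = [math.cos(2 * math.pi * f / FS_HZ) for f in freqs]
print(two_tone_estimate(lsp_cos_to_hz(q)))   # approximately (770.0, 1336.0)
```

The resulting frequency pair could then be matched against the DTMF grid of FIG. 2, alongside pitch and gain cues.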

Note that the GSM-AMR codec can operate in eight different modes of speech compression and the G.723.1 codec can operate in two different modes of speech compression. The DTMF detection algorithm illustrated in FIG. 5 applies to any rate of the GSM-AMR and G.723.1 codecs. The algorithm also applies to any other CELP-based voice codec.

FIG. 9 is a schematic depicting a multi-input mixer 900 that has multiple compressed voice signals as input. The compressed signals may have been encoded using different codec standards. The multi-input mixer mixes the speech information from the multiple inputs, and outputs mixed compressed signals.

In a specific embodiment, the multi-input mixing module comprises a feature detection unit capable of receiving one or more sets of CELP parameters and external commands and detecting a plurality of speech features. In a specific embodiment, the feature detection unit is adapted to determine a plurality of speech signal features, the determining including classifying an input represented by the CELP parameters as active speech, silence descriptor frames, or discontinuous transmission frames. In other embodiments, the feature detection unit determines a plurality of speech signal features including one or more of LSP spectrum information, pitch information, fixed-codebook information, and energy information. The module also has a sorting unit capable of processing the detected features of the more than one set of CELP parameters and ranking an order of importance for each set of CELP parameters based upon predetermined criteria. According to certain embodiments, the sorting unit receives data from the feature detection unit and arranges the order of importance of the multiple sets of CELP parameters based upon the predetermined criteria. In a specific embodiment, the sets of CELP parameters may be characterized by voice compression standards in several ways:

- the more than one set of CELP parameters can be characterized by more than one voice compression standard;
- two sets of CELP parameters can be characterized by the same voice compression standard; or
- all sets of CELP parameters can be characterized by the same voice compression standard.

In certain embodiments, sets of CELP parameters generated using different voice compression standards may have been interpolated to match the frame size, subframe size, or other characteristic.
Additionally, the module has a mixing decision unit capable of determining a processing strategy, selecting some or all sets of CELP parameters for processing, and controlling the processing of the more than one set of CELP parameters. According to a specific embodiment, the mixing decision unit receives data from the sorting unit and external control commands to determine the sets of CELP parameters that are processed. A mixing computation unit capable of processing more than one set of CELP parameters is included. Preferably, the mixing computation unit can pass through a single set of CELP parameters, select and mix multiple sets of CELP parameters, or send silence descriptor information.
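The feature detection, sorting, and mixing decision steps can be sketched as follows. The sketch assumes each input frame has already been classified as active speech, SID, or no data, and that it carries an energy estimate and an optional priority from external control commands. The `InputFrame` fields, the frame-type labels, the ranking key (priority, then energy), and the `max_mixed` limit are all hypothetical choices. The sketch does not consider the previous mixing decision.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class InputFrame:
    channel: int
    frame_type: str      # hypothetical labels: "speech", "sid", or "no_data"
    energy_db: float     # per-frame energy estimate (e.g. from decoded gains)
    priority: int = 0    # from external control commands; larger is more important

def rank_inputs(frames: Sequence[InputFrame]) -> List[InputFrame]:
    """Sorting step: active-speech inputs only, by priority then energy, most important first."""
    active = [f for f in frames if f.frame_type == "speech"]
    return sorted(active, key=lambda f: (f.priority, f.energy_db), reverse=True)

def mixing_decision(frames: Sequence[InputFrame],
                    max_mixed: int = 2) -> Tuple[str, List[int]]:
    """Decision step: choose a strategy and the channels whose CELP parameters are used."""
    ranked = rank_inputs(frames)
    if not ranked:
        # No active talker: forward silence descriptor data if any input has it.
        sid = [f.channel for f in frames if f.frame_type == "sid"]
        return ("sid", sid[:1]) if sid else ("none", [])
    chosen = [f.channel for f in ranked[:max_mixed]]
    return ("pass_through" if len(chosen) == 1 else "mix"), chosen

frames = [InputFrame(0, "speech", -20.0), InputFrame(1, "speech", -10.0),
          InputFrame(2, "sid", -60.0), InputFrame(3, "speech", -30.0, priority=1)]
print(mixing_decision(frames))   # ('mix', [3, 1])
```

The three outcomes ("pass_through", "mix", "sid") correspond to the three behaviors of the mixing computation unit described above.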

Conventional voice mixing solutions handle voice codec inputs in a tandem approach. The speech information contained in the multiple bitstream inputs is obtained and decoded. Voice mixing of the inputs is performed in the speech domain, and the mixed speech is then re-encoded. An example of a voice mixing application is a conference bridge, which handles multiple channels during a conference call. In a conference call scenario, if the participants have different voice codecs, the re-encoding process involves multiple specific encoding processes for the mixed speech.

FIG. 10 illustrates a conventional voice mixing solution 1000 in a tandem approach. Speaker 1 sends speech information in codec A compression format, and speaker 2 sends speech information in codec B compression format. The listener accepts codec C voice compression format. To mix speech from speakers 1 and 2 and send the mixed speech to the listener, the voice mixer requires decoders A and B to convert the two input voice compression formats to the speech domain, after which it mixes the input speech signals. Before the mixed signal is sent, it must be re-encoded to codec C format.

A tandem-based approach to voice mixing is therefore inefficient: it requires the complete decoding of the incoming bitstreams to speech signals, the combining of these signals in the speech space, and the complete encoding of the mixed speech signals into the outgoing bitstreams.

FIG. 11 is a block diagram further illustrating the multi-input mixing module 1100 in the described embodiment according to the present invention. The multi-input mixing module comprises a feature detection sub-module, a sorting sub-module, a mixing decision sub-module, and a mixing computation sub-module. The feature detection sub-module computes speech signal features from each set of CELP parameters. If the CELP parameters are produced by different CELP compression standards, interpolation of the CELP parameters is required to match the frame size, subframe size, or other characteristics. The signal features computed include signal energy, frame type, and signal type (i.e., active speech, inactive speech, or discontinuous transmission). The sorting sub-module computes the importance of each set of CELP parameters from the computed signal features and sorts the input sets of CELP parameters according to their importance. The mixing decision sub-module combines the sorting results, external commands, and the previous mixing decision to determine the mixing strategy. The decision can be that no sets of CELP parameters are selected, only one set is selected, parts of some sets are selected, or all sets are selected. The mixing computation sub-module mixes the selected sets of CELP parameters and outputs the mixed CELP parameters.
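By way of illustration only, the feature detection and sorting sub-modules might be sketched in Python as follows. The sketch assumes that each input set has already been interpolated to the destination subframe size, uses excitation energy as the only feature besides frame type, and ranks active-speech sets ahead of silence descriptor or discontinuous transmission sets. This ranking rule is an assumption, since the specification leaves the predetermined criteria open, and the names `FrameType`, `CelpSet`, and `sort_by_importance` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FrameType(Enum):
    ACTIVE = "active speech"
    SID = "silence descriptor"
    DTX = "discontinuous transmission"


@dataclass
class CelpSet:
    """Hypothetical container for one input's CELP parameters for one subframe."""
    channel: int
    frame_type: FrameType
    excitation: List[float]  # excitation samples, assumed already interpolated


def excitation_energy(excitation: List[float]) -> float:
    """Feature detection: total subframe excitation energy."""
    return sum(x * x for x in excitation)


def sort_by_importance(sets: List[CelpSet]) -> List[CelpSet]:
    """Sorting: active-speech sets first, each group ordered by descending energy."""
    return sorted(
        sets,
        key=lambda s: (s.frame_type is not FrameType.ACTIVE,
                       -excitation_energy(s.excitation)),
    )
```

With this rule a loud silence descriptor frame still ranks below a quiet active-speech frame, which matches the emphasis on active speech elsewhere in this description.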

As an example, the multi-input mixing module is used to mix input channels during a conference call. Three participants, labeled 1, 2, and 3, join the call, and at a certain time only participant 1 is talking. The mixing decision for the direction to participant 1 is that no input channels are selected, since participants 2 and 3 are silent. The mixing decision for the directions to participants 2 and 3 is that only the channel from participant 1 is selected, since it is the only channel detected as containing active speech.

If participants 1 and 2 are both talking at a certain time, the mixing decision for the direction to participant 3 is that input channels 1 and 2 are selected. However, the mixing decision for the directions to participants 1 and 2 is that only a single channel, namely the other talker's channel, is selected, since the input channel from participant 3 is silent. The mixing module can be configured not to mix a participant's speech back to that participant, in order to avoid unwanted echoes.
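The per-direction mixing decisions in the two conference scenarios above can be sketched as follows. This is a minimal illustration that assumes the set of channels carrying active speech is already known from feature detection; the function name `mixing_decisions` and its `exclude_self` flag are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def mixing_decisions(participants: Iterable[int], active: Set[int],
                     exclude_self: bool = True) -> Dict[int, List[int]]:
    """For each listener, list the input channels selected for mixing.

    A channel is selected only if it carries active speech and, when
    exclude_self is set, is not the listener's own channel (echo avoidance).
    """
    participants = list(participants)
    return {
        listener: [ch for ch in participants
                   if ch in active and not (exclude_self and ch == listener)]
        for listener in participants
    }
```

For example, `mixing_decisions([1, 2, 3], {1, 2})` selects channels 1 and 2 toward participant 3, but only channel 2 toward participant 1 and only channel 1 toward participant 2.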

There are several mixing computation approaches. As an example, for mixing two inputs, A and B, the total subframe excitation energy of each incoming stream is given by the expressions $Ex_{A} = \sum_{n=1}^{N} e_{A}^{2}(n)$ and $Ex_{B} = \sum_{n=1}^{N} e_{B}^{2}(n)$,

where $e_{A}(n)$ and $e_{B}(n)$ are the excitation vectors of inputs A and B respectively, $N$ is the subframe size of the destination codec, and $Ex_{A}$ and $Ex_{B}$ are the energies of inputs A and B respectively.

The pitch lag can be derived as $PL_{mix} = \begin{cases} PL_{A} & \text{if } Ex_{A} \geq Ex_{B} \\ PL_{B} & \text{otherwise} \end{cases}$,

where $PL_{A}$ and $PL_{B}$ are the pitch lags of inputs A and B respectively, and $PL_{mix}$ is the pitch lag of the mixed signal.
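The energy and pitch lag expressions above translate directly into code. The sketch below assumes that the excitation samples of each input have already been interpolated to one destination-codec subframe of $N$ samples; the function names are hypothetical.

```python
from typing import Sequence


def subframe_energy(excitation: Sequence[float]) -> float:
    """Ex = sum over n = 1..N of e(n)^2 for one destination-codec subframe."""
    return sum(x * x for x in excitation)


def mix_pitch_lag(pl_a: int, ex_a: float, pl_b: int, ex_b: float) -> int:
    """PL_mix: the pitch lag of the input with the larger (or equal) energy."""
    return pl_a if ex_a >= ex_b else pl_b
```

For instance, with $e_{A} = (1.0, -2.0, 0.5)$ and $e_{B} = (0.5, 0.5, 0.5)$, the energies are $Ex_{A} = 5.25$ and $Ex_{B} = 0.75$, so the mixed pitch lag is taken from input A. Ties go to input A, as in the expression for $PL_{mix}$.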

There are a few different methods for creating the new LSP parameters. The first involves converting the LSP parameters to spectrum parameters, averaging the spectrum parameters according to subframe energy, and converting the averaged spectrum parameters back to LSP parameters. The averaging of spectrum parameters is shown in the equation below: $LSF_{mix} = \frac{LSF_{A} \cdot Ex_{A} + LSF_{B} \cdot Ex_{B}}{Ex_{A} + Ex_{B}}$,

where $LSF_{A}$ and $LSF_{B}$ are the spectrum parameters of inputs A and B respectively, and $LSF_{mix}$ are the spectrum parameters of the mixed signal.
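A minimal sketch of this first LSP method follows. It assumes, as in G.729, that the LSPs are in the cosine domain ($q_{i} = \cos \omega_{i}$) and that the "spectrum parameters" are the line spectral frequencies $\omega_{i}$ in radians. The equal-weight fallback for two zero-energy inputs is a hypothetical choice, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def lsp_to_lsf(lsp: Sequence[float]) -> List[float]:
    # Assumed G.729-style convention: q_i = cos(w_i), with 0 < w_i < pi.
    return [math.acos(q) for q in lsp]


def lsf_to_lsp(lsf: Sequence[float]) -> List[float]:
    return [math.cos(w) for w in lsf]


def mix_lsp(lsp_a: Sequence[float], ex_a: float,
            lsp_b: Sequence[float], ex_b: float) -> List[float]:
    """LSP -> LSF, energy-weighted average of the LSFs, then LSF -> LSP."""
    lsf_a, lsf_b = lsp_to_lsf(lsp_a), lsp_to_lsf(lsp_b)
    total = ex_a + ex_b
    if total <= 0.0:
        w_a = w_b = 0.5  # hypothetical fallback when both inputs have zero energy
    else:
        w_a, w_b = ex_a / total, ex_b / total
    lsf_mix = [w_a * a + w_b * b for a, b in zip(lsf_a, lsf_b)]
    return lsf_to_lsp(lsf_mix)
```

Because the weights are non-negative and sum to one, the mixed LSFs remain in increasing order whenever both input LSF vectors are ordered, so the mixed LP synthesis filter keeps the ordering property that LSP-based codecs rely on for stability.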

Another method is to reintroduce the LSP contribution into the individual excitation signals (that is, to pass each excitation through its own LP synthesis filter), to combine the filtered excitation signals, and then to recalculate the LSP parameters and the resulting excitation.
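The re-analysis method can be sketched as follows, under several stated assumptions. The LSPs are taken to have already been converted to LP coefficients (that conversion is omitted). The filter is written as $A(z) = 1 + \sum_{k=1}^{p} a_{k} z^{-k}$, as in G.729. Filter memories start at zero, and no analysis window is applied. Autocorrelation followed by the Levinson-Durbin recursion is substituted here as a standard way to recalculate the LP coefficients; it is not necessarily the method of any particular embodiment, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def synthesize(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole synthesis 1/A(z): s(n) = e(n) - sum_k a_k s(n-k), zero state."""
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]
        out.append(acc)
    return out


def autocorr(x: Sequence[float], order: int) -> List[float]:
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """LP coefficients a_1..a_p for A(z) = 1 + sum_k a_k z^-k."""
    a = [0.0] * order
    err = r[0]
    if err <= 0.0:
        return a
    for i in range(order):
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] + k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:
            break
    return a


def inverse_filter(signal: Sequence[float], a: Sequence[float]) -> List[float]:
    """Analysis filter A(z): e(n) = s(n) + sum_k a_k s(n-k), zero state."""
    return [s + sum(ak * signal[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0)
            for n, s in enumerate(signal)]


def remix_by_reanalysis(exc_a: Sequence[float], a_a: Sequence[float],
                        exc_b: Sequence[float], a_b: Sequence[float],
                        order: int) -> Tuple[List[float], List[float]]:
    """Filter each excitation, sum, re-estimate LP coefficients and excitation."""
    s = [x + y for x, y in zip(synthesize(exc_a, a_a), synthesize(exc_b, a_b))]
    a_mix = levinson_durbin(autocorr(s, order), order)
    return a_mix, inverse_filter(s, a_mix)
```

Since the new excitation is obtained by inverse filtering with the recalculated coefficients, passing it back through `synthesize` with those coefficients reproduces the combined filtered signal exactly, apart from rounding.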

Another method ignores the LSP parameters of the lower-energy inputs and uses only the LSP parameters of the higher-energy inputs, or selects the inputs whose LSP parameters are used based on control parameters such as channel priority.
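A minimal sketch of this selection method follows. It assumes a hypothetical externally supplied priority list of channel numbers (highest priority first) that overrides the energy rule when any listed channel is present. The names `MixInput` and `select_lsp` are illustrative.

```python
from typing import List, NamedTuple, Sequence


class MixInput(NamedTuple):
    channel: int
    lsp: List[float]
    energy: float


def select_lsp(inputs: Sequence[MixInput], priority: Sequence[int] = ()) -> List[float]:
    """Return the LSP vector of the dominant input; the others are ignored.

    If any input channel appears in the priority list, the highest-priority
    such channel wins; otherwise the highest-energy input wins.
    """
    prioritized = [inp for inp in inputs if inp.channel in priority]
    if prioritized:
        order = list(priority)
        best = min(prioritized, key=lambda inp: order.index(inp.channel))
    else:
        best = max(inputs, key=lambda inp: inp.energy)
    return best.lsp
```

This method avoids any spectral averaging, at the cost of the weaker inputs being rendered through the dominant input's spectral envelope.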

As with the LSP mixing computation, the mixed excitation parameters can be computed by a few different methods: averaging the excitation parameters according to subframe energy, re-calculating them using the mixed LSP parameters, or using only the excitation of the highest-energy input.
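Two of these excitation methods can be sketched as follows. The energy-weighted average is applied here directly to interpolated excitation vectors, which is only one possible reading of "averaging excitation parameters according to subframe energy"; a practical transcoder might instead operate on codebook indices and gains. The re-calculation method is not shown, and the function name `mix_excitation` and its `mode` values are hypothetical.

```python
from typing import List, Sequence


def mix_excitation(excitations: Sequence[Sequence[float]],
                   mode: str = "weighted") -> List[float]:
    """Mix equal-length excitation vectors for one destination subframe.

    mode="weighted": average sample by sample, weighting each input by its
    subframe energy. mode="highest": keep only the highest-energy input.
    """
    energies = [sum(x * x for x in e) for e in excitations]
    if mode == "highest":
        return list(excitations[energies.index(max(energies))])
    if mode != "weighted":
        raise ValueError(f"unknown mode: {mode}")
    n = len(excitations[0])
    total = sum(energies)
    if total == 0.0:
        return [0.0] * n
    return [sum(w * e[i] for w, e in zip(energies, excitations)) / total
            for i in range(n)]
```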

In many scenarios, such as teleconferencing, not all of the sets of CELP parameters represent active speech; such sets instead represent silence description frames. These frames are ignored. In other words, the only sets of CELP parameters that are mixed are those representing signals that contain speech. This reduces the amount of computation and also rejects noise transmitted in sets of CELP parameters that do not represent active speech.

FIG. 12 illustrates a flowchart of the CELP domain multi-input mixing method 1200. The method involves performing signal feature computation on each set of CELP parameters; arranging the order of importance of the sets of CELP parameters according to the results of the feature computation; checking any priorities specified by external commands; determining the sets of CELP parameters to be mixed according to their importance and priority; mixing the selected sets of CELP parameters; and finally outputting the mixed CELP parameters.

There are three main mixing strategies. In the first case, where none of the sets of CELP parameters represents active speech, the mixing computation outputs silence descriptor frames or discontinuous transmission information. In the second case, where only one set of CELP parameters represents active speech or only one set is selected for mixing, the mixing computation outputs the selected CELP parameters as the mixed result. In the third case, where more than one set of CELP parameters is selected for mixing, the mixing computation mixes the selected sets and outputs the mixed result.
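The three strategies can be sketched as a simple dispatch over the selected sets. In this illustration, sets that do not represent active speech are dropped before dispatch, as described above. The parameter payload is a hypothetical dictionary, and the caller-supplied `mix` function stands in for whichever mixing computation is chosen.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Params = Dict[str, Any]  # hypothetical payload: LSPs, pitch lag, gains, ...


@dataclass
class CelpFrame:
    channel: int
    active: bool  # False for silence descriptor / DTX frames
    params: Params = field(default_factory=dict)


def apply_strategy(selected: Sequence[CelpFrame],
                   mix: Callable[[List[Params]], Params]
                   ) -> Tuple[str, Optional[Params]]:
    active = [f for f in selected if f.active]  # silence frames are ignored
    if not active:
        return "silence", None  # case 1: emit SID / DTX information
    if len(active) == 1:
        return "pass-through", active[0].params  # case 2: forward unchanged
    return "mix", mix([f.params for f in active])  # case 3: mix selected sets
```

The pass-through case needs no mixing computation at all, which is why it is worth distinguishing from the general case.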

FIG. 13 illustrates a block diagram of an embodiment of multi-input mixing 1300 in the CELP domain within a voice transcoder according to the present invention. The voice transcoder with multi-input mixing connects more than two participants; as an example, the multi-input mixing system here connects three participants. To mix two compressed speech signals from source codecs and transcode the result to a destination codec format, the multi-input mixing system comprises: a source codec unpacker module that unpacks the first input bitstream into its CELP parameters; another source codec unpacker module that unpacks the second input bitstream into its CELP parameters; an interpolation module that converts the first source codec's CELP parameters to interpolated CELP parameters that match the frame and subframe size of the destination codec; another interpolation module that converts the second source codec's CELP parameters to interpolated CELP parameters that match the frame and subframe size of the destination codec; a mixing module that mixes the interpolated CELP parameters from the two inputs and sends the mixed CELP parameters to the next stage; a destination codec mapping module that converts the mixed CELP parameters to quantized CELP parameters according to the destination codec; and a destination codec packer module that converts the quantized CELP parameters into a bitstream according to the destination codec standard.

According to the described embodiment, the incoming bitstreams are not fully decoded to the speech space; instead, they are mixed in the CELP parameter space. This offers the advantage of considerably lower computational requirements, since the incoming bitstreams are neither fully decoded to speech signals nor fully re-encoded.

FIG. 14 illustrates a block diagram of another configuration of a multi-input mixer 1400 in voice transcoding, in which a mixed compressed voice signal must be sent to two destination codecs with different frame sizes.

FIG. 15 depicts an exemplary voice transcoder 1500 with a multi-input mixer used in a conference call among voice-over-IP packet networks and wireless communication systems. Four participants join the conference call: two from packet networks and two from wireless communication systems. All voice input signals are in compressed voice formats, and these formats differ; they are generated by the voice codecs G.729, G.723.1, and GSM-AMR. Participants A and B, within packet networks, use the G.729 and G.723.1 codecs respectively, and participants C and D, within wireless communication systems, use the GSM-AMR codec.

FIG. 16 shows the difference in frame size and subframe size 1600 among the three voice codecs G.729, GSM-AMR, and G.723.1. These three voice codecs have different frame lengths: G.729 has a frame length of 10 ms, GSM-AMR has a frame length of 20 ms, and G.723.1 has a frame length of 30 ms. In addition, G.729 has two subframes per frame, while GSM-AMR and G.723.1 have four subframes per frame. The resulting subframe lengths are therefore 5 ms for G.729 and GSM-AMR, and 7.5 ms for G.723.1.
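The frame arithmetic above can be made concrete with a small sketch that computes subframe lengths and, for one destination subframe, the fraction covered by each source subframe. Weighting source parameters by time overlap is only one plausible basis for interpolation, an assumption made here for illustration, since the specification does not fix the interpolation rule. The streams are also assumed to be time-aligned at 0 ms, and the names are hypothetical.

```python
import math
from fractions import Fraction
from functools import reduce
from typing import Dict

# Frame length (ms) and subframes per frame, as stated for FIG. 16.
CODECS = {"G.729": (10, 2), "GSM-AMR": (20, 4), "G.723.1": (30, 4)}


def subframe_ms(codec: str) -> Fraction:
    frame_ms, subframes = CODECS[codec]
    return Fraction(frame_ms, subframes)


def overlap_weights(src: str, dst: str, dst_index: int) -> Dict[int, Fraction]:
    """Fraction of destination subframe `dst_index` covered by each source subframe."""
    s, d = subframe_ms(src), subframe_ms(dst)
    start, end = dst_index * d, (dst_index + 1) * d
    weights: Dict[int, Fraction] = {}
    i = math.floor(start / s)
    while i * s < end:
        covered = min(end, (i + 1) * s) - max(start, i * s)
        if covered > 0:
            weights[i] = covered / d
        i += 1
    return weights


# Period after which all three frame grids realign: lcm(10, 20, 30) = 60 ms.
SUPERFRAME_MS = reduce(lambda a, b: a * b // math.gcd(a, b),
                       (f for f, _ in CODECS.values()))
```

For example, the second G.723.1 subframe (7.5 ms to 15 ms) is covered one third by G.729 subframe 1 (5 ms to 10 ms) and two thirds by G.729 subframe 2 (10 ms to 15 ms). Every 60 ms, six G.729 frames, three GSM-AMR frames, and two G.723.1 frames line up again.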

FIG. 17 illustrates a block diagram of voice transcoding with a multi-input mixer 1700 for all directions between the codecs G.729, G.723.1, and GSM-AMR according to the present invention. Each connection to a participant has a path for both input and output bitstreams. Hence, for each codec standard the transcoder includes an unpacker module and a packer module to handle the input and output bitstreams, a mixing module to mix the speech information of all participants other than the participant at the destination codec, and a specific mapping module to convert the mixed CELP parameters to quantized CELP parameters. Because three different codecs (G.723.1, GSM-AMR, and G.729) are used in the conference call, each connection requires two interpolation modules following its unpacker module. The two interpolation modules interpolate the source codec CELP parameters to interpolated CELP parameters that match the frame size, subframe size, and other characteristics of the other two destination codecs. For example, the input bitstream from participant A is in G.729 codec format. For this input, the destination codecs are G.723.1 for participant B and GSM-AMR for participants C and D. The G.729 connection therefore requires an interpolation module G.729->AMR to convert G.729 CELP parameters to AMR CELP parameters, and another interpolation module G.729->G.723.1 to convert G.729 CELP parameters to G.723.1 CELP parameters. Thus, using the multi-input mixing methods described above, the system can perform voice transcoding with multi-input mixing functionality without requiring full decoding and re-encoding processes. Depending upon the embodiment, there can be other variations, modifications, and alternatives. Certain examples of other CELP transcoders can be found throughout the present specification and more particularly below.
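The module count in this example can be checked with a short sketch that derives, from the participant-to-codec assignment of FIG. 15, which interpolation modules follow each unpacker and which path each input takes toward a given listener. The names `PARTICIPANTS`, `interpolators_for`, and `routes_toward` are hypothetical, and the listener's own input is excluded as described for the mixing module.

```python
from typing import Dict, List, Optional

# Participant -> codec, as in the FIG. 15 example.
PARTICIPANTS = {"A": "G.729", "B": "G.723.1", "C": "GSM-AMR", "D": "GSM-AMR"}


def interpolators_for(source_codec: str, participants: Dict[str, str]) -> List[str]:
    """Interpolation modules needed after the unpacker of one source codec."""
    return sorted({f"{source_codec}->{dst}" for dst in participants.values()
                   if dst != source_codec})


def routes_toward(listener: str, participants: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Inputs mixed for `listener` and the interpolation path each one takes.

    None means the input already uses the listener's codec, so no
    interpolation is needed.
    """
    dst = participants[listener]
    return {p: (None if src == dst else f"{src}->{dst}")
            for p, src in participants.items() if p != listener}
```

For listener C, inputs A and B pass through the G.729->GSM-AMR and G.723.1->GSM-AMR interpolators respectively, while input D is already in GSM-AMR format and needs none. Each of the three codecs needs exactly two interpolators, matching the description above.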

The DTMF signal detection and multi-input mixing in the CELP domain described in this document apply generically to CELP parameters generated by all CELP-based voice codecs, such as G.723.1, GSM-AMR, EVRC, G.728, G.729, G.729A, QCELP, MPEG-4 CELP, SMV, AMR-WB, VMR, and any voice codec that makes use of code-excited linear prediction voice coding.

The previous description of the preferred embodiment is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

1.-9. (canceled)
 10. The apparatus of claim 35, further comprising a transcoding module coupled to the DTMF detection module.
 11. (canceled)
 12. The apparatus of claim 35, wherein the DTMF signal detection module is provided in an advanced processing module, the advanced processing module being coupled to a transcoding module.
 13.-14. (canceled)
 15. The apparatus of claim 35, wherein the DTMF signal detection module is incorporated within a CELP-based voice transcoder.
 16.-21. (canceled)
 22. The method of claim 36, wherein the CELP parameters include one or more of LSP information, pitch information, excitation vector information, energy information, fixed-codebook information, and silence description information.
 23.-34. (canceled)
 35. An apparatus for processing telecommunications signals, the apparatus comprising: a dual-tone multi-frequency (DTMF) signal detection module, the DTMF signal detection module being adapted to: receive an input signal in a CELP-based domain, the input signal being represented by one or more input CELP parameters; process information associated with the one or more input CELP parameters; determine one or more DTMF tones based upon at least information associated with the one or more input CELP parameters; output the one or more DTMF tones if determined; wherein the one or more DTMF tones are associated with the input signal in the CELP-based domain.
 36. A method for processing telecommunications signals in a CELP based domain, the method comprising: receiving an input signal in a CELP-based domain, the input signal being represented by one or more input CELP parameters; processing information associated with the one or more input CELP parameters; determining one or more DTMF tones based upon at least information associated with the one or more input CELP parameters; outputting the one or more DTMF tones if determined; wherein the one or more DTMF tones are associated with the input signal in the CELP-based domain.
 37. An apparatus for processing telecommunications signals, the apparatus being adapted to operate in a CELP based domain without decoding to a speech signal domain, the apparatus comprising: a multi-input mixing module, the multi-input mixing module being adapted to: receive one or more sets of CELP parameters from one or more CELP-based codecs, the one or more sets of CELP parameters representing respectively one or more signals; process the received one or more sets of CELP parameters into a single set of CELP parameters; output the single set of CELP parameters, the single set of CELP parameters representing a composite signal; wherein the one or more sets of CELP parameters are processed into the single set of CELP parameters without decoding the one or more sets of CELP parameters into the speech signal domain, without mixing the one or more signals into the composite signal in the speech signal domain, and without encoding the composite signal from the speech signal domain into a CELP based domain.
 38. The apparatus of claim 37, further comprising a transcoding module coupled to the multi-input mixing module.
 39. The apparatus of claim 37, wherein the multi-input mixing module is provided in an advanced processing module, the advanced processing module being coupled to a transcoding module.
 40. The apparatus of claim 37, wherein the multi-input mixing module is incorporated within a CELP-based voice transcoder.
 41. A method for processing telecommunications signals in a CELP based domain, the method comprising: receiving one or more compressed signals from one or more CELP-based coders, the one or more compressed signals including respectively one or more sets of CELP parameters; processing the one or more compressed signals into a composite signal using the one or more sets of CELP parameters, the processing the one or more compressed signals including processing the one or more sets of CELP parameters into a single set of CELP parameters; outputting the composite signal, the composite signal including the single set of CELP parameters; wherein: the processing the one or more compressed signals into a composite signal does not include decoding the one or more sets of CELP parameters into a speech signal domain; the processing the one or more compressed signals into a composite signal does not include mixing the one or more compressed signals into the composite signal in the speech signal domain; the processing the one or more compressed signals into a composite signal does not include encoding the composite signal from the speech signal domain into the CELP based domain.
 42. The method of claim 41, wherein the CELP parameters include one or more of LSP information, pitch information, excitation vector information, energy information, fixed-codebook information, and silence description information.
 43. The method of claim 41, wherein the processing of multiple sets of CELP parameters is capable of mixing the CELP parameters of more than two input codecs. 